Top-k String Auto-Completion with Synonyms
Authors
Abstract
Auto-completion is one of the most prominent features of modern information systems. Existing auto-completion solutions provide suggestions based on the beginning of the character sequence currently being typed (i.e., the prefix). However, in many real applications an entity often has synonyms or abbreviations. For example, “DBMS” is an abbreviation of “Database Management Systems”. In this paper, we study a novel type of auto-completion that uses synonyms and abbreviations. We propose three trie-based algorithms to solve top-k auto-completion with synonyms, each with different space and time complexity trade-offs. Experiments on large-scale datasets show that it is possible to support effective and efficient synonym-based retrieval of completions over a million strings with thousands of synonym rules at about a microsecond per completion, with a small space overhead (i.e., 160-200 bytes per string). The source code of our experiments can be downloaded at http://udbms.cs.helsinki.fi/?projects/autocompletion/download.
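The abstract does not spell out its three algorithms, so the following is only a minimal Python sketch of the general idea, under our own assumptions: a synonym rule is a pair (lhs, rhs) meaning that lhs may be typed in place of rhs, every dictionary string is expanded into its rewritten variants, and each variant is inserted into a single trie that points back to the original string and its static score. The class `SynonymCompleter`, its methods, and the sample strings and scores are hypothetical; this naive expansion is not claimed to be any of the paper's three algorithms.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # char -> TrieNode
    ids: set = field(default_factory=set)  # ids of strings with a variant ending here


class SynonymCompleter:
    """Hypothetical synonym-aware top-k completer (naive expansion sketch).

    Each dictionary string is expanded into its synonym variants, and every
    variant is inserted into one trie whose terminal nodes point back to the
    original string. Not one of the paper's three algorithms.
    """

    def __init__(self, rules):
        # rules: (lhs, rhs) pairs meaning "lhs may be typed instead of rhs",
        # e.g. ("DBMS", "Database Management Systems")
        self.rules = rules
        self.root = TrieNode()
        self.strings = []  # id -> original string
        self.scores = []  # id -> static score

    def _variants(self, s):
        # Apply each rule once, in order, replacing every occurrence of rhs
        # with lhs; both the rewritten and the unrewritten forms are kept.
        variants = {s}
        for lhs, rhs in self.rules:
            variants |= {v.replace(rhs, lhs) for v in variants if rhs in v}
        return variants

    def add(self, s, score):
        sid = len(self.strings)
        self.strings.append(s)
        self.scores.append(score)
        for v in self._variants(s):
            node = self.root
            for ch in v.lower():  # case-insensitive matching (an assumption)
                node = node.children.setdefault(ch, TrieNode())
            node.ids.add(sid)

    def top_k(self, prefix, k):
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        # Gather every string id below the prefix node; the set removes
        # duplicates when several variants of one string share the prefix.
        found, stack = set(), [node]
        while stack:
            n = stack.pop()
            found |= n.ids
            stack.extend(n.children.values())
        best = heapq.nlargest(k, found, key=lambda i: self.scores[i])
        return [self.strings[i] for i in best]


if __name__ == "__main__":
    c = SynonymCompleter([("DBMS", "Database Management Systems")])
    c.add("Database Management Systems Tutorial", 90)
    c.add("DBpedia", 70)
    c.add("Data Mining", 50)
    print(c.top_k("DBM", 2))  # ['Database Management Systems Tutorial']
    print(c.top_k("db", 2))  # ['Database Management Systems Tutorial', 'DBpedia']
```

Typing "DBM" finds the long title only because its "DBMS Tutorial" variant was materialized. That keeps queries a plain trie walk, but the number of stored keys can grow exponentially in the number of rules that apply to one string, which illustrates why space and time trade-offs arise in this problem. Collecting the whole subtree is also linear in the number of matches; practical top-k tries usually keep score information at internal nodes so the search can stop early.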
Similar resources
Space-efficient data structures for Top-k completion
Virtually every modern search application, either desktop, web, or mobile, features some kind of query auto-completion. In its basic form, the problem consists in retrieving from a string set a small number of completions, i.e. strings beginning with a given prefix, that have the highest scores according to some static ranking. In this paper, we focus on the case where the string set is so larg...
Widen the Peepholes! Entity-based Auto-Suggestion as a rich and yet immediate Starting Point for Exploratory Search
Today’s search engines provide instant keyword-based auto-suggestion and completion of the user’s search queries. This paper presents a novel auto-suggestion interface for the Semantic Multimedia Explorer (SEMEX), a semantic search engine that supports entity-based exploratory video retrieval. In contrast to traditional text-based retrieval, auto-suggestion and auto-completion of the user’s qu...
Piecewise Synonyms for Enhanced UMLS Source Terminology Integration
The UMLS contains more than 100 source vocabularies and is growing via the integration of others. When integrating a new source, the source terms already in the UMLS must first be found. The easiest approach to this is simple string matching. However, string matching usually does not find all concepts that should be found. A new methodology, based on the notion of piecewise synonyms, for enhanc...
Predicting Source Code Effectiveness of Prediction based Source Code Auto Completion
Auto Completion is the facility provided by most modern Integrated Development Environments and source code editors for word completion when editing source code. All auto completion mechanisms that we know of use syntactic knowledge of the programming language to provide this feature. We investigate the use of programming language agnostic prediction models to provide auto completion. We implem...
An Algorithm for Hypergraph Completion According to Hyperedge Replacement Grammars
The algorithm of Cocke, Younger, and Kasami is a dynamic programming technique well-known from string parsing. It has been adopted to hypergraphs successfully by Lautemann. Therewith, many practically relevant hypergraph languages generated by hyperedge replacement can be parsed in an acceptable time. In this paper we extend this algorithm by hypergraph completion: If necessary, appropriate fre...